Knowlesys Blue Whale Web Data Extraction System (BWDES)

Released on: May 27, 2008, 11:42 pm

Press Release Author: Knowlesys

Industry: Computers

Press Release Summary: The web is an ocean of information containing more than 10
billion web pages, 90% of which are unstructured or semi-structured. It is currently
growing at a rate of 1 million pages per day. Information is increasing explosively,
while people's time and energy are limited. Information of real value to enterprises
and individuals lies scattered across this worldwide ocean, and extracting it has
become one of the most pressing tasks for research institutions working on
Information Retrieval, Data Mining, Knowledge Management, Competitive Intelligence
and related fields.

Press Release Body: Overview
The Blue Whale Web Data Extraction System (BWDES) is like a huge blue whale that
cruises this information ocean every day, automatically and accurately extracting
valuable information for you while excluding the many useless elements of a web page,
such as page headers and footers, column listings and advertisements.
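A much simpler way to approximate this kind of filtering is to drop any text that sits inside page regions which usually hold headers, footers, navigation and sidebars. The sketch below is not the BWDES method. It assumes, hypothetically, that such regions are marked with the HTML `header`, `footer`, `nav` and `aside` tags, and it uses only Python's standard `html.parser` module.

```python
from html.parser import HTMLParser

# Hypothetical choice: tags treated as page "boilerplate" (headers, footers,
# navigation/column listings, sidebars) plus non-content script/style blocks.
BOILERPLATE_TAGS = {"header", "footer", "nav", "aside", "script", "style"}


class MainTextExtractor(HTMLParser):
    """Collects visible text, skipping anything nested inside boilerplate tags."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0  # > 0 while inside a boilerplate element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in BOILERPLATE_TAGS:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag in BOILERPLATE_TAGS and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self.skip_depth == 0:
            self.chunks.append(text)


def extract_main_text(html: str) -> str:
    """Return the non-boilerplate text of a page, joined by single spaces."""
    parser = MainTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


page = """
<html><body>
  <header>Site Logo | Home | About</header>
  <nav><a href="/a">Column A</a> <a href="/b">Column B</a></nav>
  <article><h1>Product X</h1><p>Price: 99 USD</p></article>
  <aside>Buy now! Advertisement</aside>
  <footer>Copyright 2008</footer>
</body></html>
"""
print(extract_main_text(page))  # prints: Product X Price: 99 USD
```

Real pages often lack such semantic tags, so production extractors typically rely on per-site templates or layout heuristics instead of a fixed tag list.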
Over more than three years, Knowlesys Software, Inc. has developed the BWDES, a
powerful web information extraction system. It has a layered architecture and a
loosely coupled modular design comprising many subsystems. The BWDES can extract large
volumes of designated information from the web and integrate it into specified
relational databases, helping customers dig precious stones out of the Internet
minefield. Because the process converts information from semi-structured to
structured form, from a dispersed to a concentrated state, from remote information
into a locally stored treasure, and from visual documents into digital records, you
can make extensive use of it in the future.
The BWDES can extract data from many types of websites. In addition to extracting
field data from semi-structured pages, it can also extract free-text information such
as e-mail addresses, as well as many types of multimedia files.
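Pulling e-mail addresses out of free text is commonly done with a regular expression. The following is a minimal illustration of that general technique, not the BWDES implementation. The pattern is a deliberately simple assumption and does not validate the full RFC 5322 address grammar.

```python
import re

# Deliberately simple pattern (an assumption, not a full RFC 5322 validator):
# local part, "@", then one or more dot-separated domain labels and a TLD.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}")


def extract_emails(text: str) -> list[str]:
    """Return addresses in order of first appearance, de-duplicated case-insensitively."""
    seen = set()
    result = []
    for match in EMAIL_RE.findall(text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            result.append(match)
    return result


sample = "Contact sales@example.com or Support@Example.org; again: sales@example.com."
print(extract_emails(sample))  # prints: ['sales@example.com', 'Support@Example.org']
```

Trailing punctuation such as the final period is left out because the pattern must end in at least two letters after the last dot.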

The BWDES is characterized by stable operation, intelligent crawling and accurate
extraction. It is an information extraction platform: when a new extraction task is
required, the platform is used to configure a new web crawling and extraction script
and its parameters.
A general database access layer in the BWDES allows its back end to connect to any
relational database, such as MS SQL Server, Oracle, DB2, Sybase, MySQL and InterBase,
and even to file-based databases such as Access. Regardless of the database type, the
extracted data can be checked with a general database browser and exported into
various formats such as XML, CSV, HTML and Excel.
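As a rough analogy for a database-neutral back end, Python's DB-API 2.0 (PEP 249) gives different database drivers a common connection/cursor interface. The sketch below stores hypothetical extracted records through that interface using the built-in `sqlite3` driver, then exports them to CSV. It is an illustration, not the BWDES access layer. One caveat: the `?` placeholder is sqlite3's parameter style, and other drivers may use `%s` or named placeholders, so a truly general layer would also have to abstract that difference.

```python
import csv
import io
import sqlite3

# Hypothetical extracted records (field name -> value), as a crawler might produce.
records = [
    {"title": "Product X", "price": 99.0, "url": "http://example.com/x"},
    {"title": "Product Y", "price": 149.5, "url": "http://example.com/y"},
]


def store_records(conn, rows):
    """Insert rows using DB-API 2.0 calls (cursor, execute, executemany, commit)."""
    cur = conn.cursor()
    cur.execute("CREATE TABLE IF NOT EXISTS items (title TEXT, price REAL, url TEXT)")
    cur.executemany(
        "INSERT INTO items (title, price, url) VALUES (?, ?, ?)",  # sqlite3 "qmark" style
        [(r["title"], r["price"], r["url"]) for r in rows],
    )
    conn.commit()


def export_csv(conn) -> str:
    """Dump the items table to CSV text, with a header row taken from the cursor."""
    cur = conn.cursor()
    cur.execute("SELECT title, price, url FROM items ORDER BY title")
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([col[0] for col in cur.description])
    writer.writerows(cur.fetchall())
    return buf.getvalue()


conn = sqlite3.connect(":memory:")
store_records(conn, records)
print(export_csv(conn))
```

Swapping in another PEP 249 driver would mainly change the `connect(...)` call and the placeholder syntax. The export function only relies on `cursor.description` and `fetchall()`, which the specification defines for all compliant drivers.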
Where it is used
Acquiring key information: Obtain all kinds of professional databases.
Competitive information system: Use keywords to monitor the marketing information of
your competitors in Internet media.
Enterprise content management: Accurately acquire external content in batches and
process it automatically.
Database marketing: Extract comments and contact information of potential customers
from guestbooks, forums and newsgroups.
Comparison system: Build commodity price comparison systems.
Enterprise Integration Portal: Embed real-time content from external websites into
your EIP interface.
Integration of Internet information: Combine information extracted from websites in
the same category, such as resumes, job postings, rental listings, product listings
and company directories.
Personal information agent: Gather up-to-date information that individuals or
enterprises may be interested in from various websites, and deliver it to users by
e-mail or by posting it on your web pages, saving the time spent browsing and
downloading.
For more information, please visit our website: http://www.knowlesys.com



Web Site: http://www.knowlesys.com

Contact Details: Email Address: lswebdataminer@Gmail.com
Phone: 86-755-86032826
Fax: 86-755-86032828
Company Name: Knowlesys
Address:
City: Shenzhen
Website URL: www.knowlesys.com
Zip: 518000
